YQL
SELECT * FROM Internet




Developer Talk Token Unicorn
Derek Gathright




•   Engineer @ Yahoo
•   Tweet from @derek
•   Blog @ http://derekville.net
•   Everything else @ http://derek.io
On December 4th, 1995
“Netscape and Sun announce JavaScript, a
cross-platform object scripting language”
   http://web.archive.org/web/20070916144913/http://wp.netscape.com/newsref/pr/newsrelease67.html
How big is the Web?
GINORMOUS!
The Internet: 1995
    http://www.jevans.com/pubnetmap.html
The Internet: 2003
      http://www.opte.org/maps/
The Internet: 2007
      http://xkcd.com/256/
The Internet: 2010
      http://xkcd.com/802/
It’s impossible to
measure the size of the
web because it is
constantly changing,
growing, and morphing.
Every second:
Twitter gets 600 new tweets
Every minute:
YouTube gets 35 hours of new video
Every month: Facebook gets
   2.5 billion new photos
Yeah, lots of it is garbage
But there's still a ton of
interesting stuff out there.
So how do you access it,
   programmatically?
It is easy enough to
  ‘scrape’ the web (using
   cURL, wget, etc...), but
    how do you parse it?
 XPath + DOM Traversal = Yay!
Regular Expressions = Double Yay!
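To make the two "Yay" options concrete, here is a minimal Python sketch that pulls link targets out of a page both ways. The HTML snippet is made up and deliberately well-formed (real pages are rarely this tidy), and the function names are only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for a page fetched with cURL or wget.
PAGE = """<html><body>
  <p>Intro with <a href="http://yahoo.com">Yahoo</a>.</p>
  <ul><li><a href="http://derek.io">Derek</a></li></ul>
</body></html>"""


def links_via_xpath(markup):
    """Walk the parsed tree with an XPath-style query for every <a> element."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.findall(".//a")]


def links_via_regex(markup):
    """Pull href values straight out of the raw text with a regular expression."""
    return re.findall(r'<a\s+href="([^"]+)"', markup)


if __name__ == "__main__":
    print(links_via_xpath(PAGE))
    print(links_via_regex(PAGE))
```

Both approaches work on this toy input. The regex version stops working as soon as an attribute order or quoting style changes, which is roughly where the pain starts.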
“If a regular expression
is longer than 2 inches,
 find another method”
       - Douglas Crockford
4.4 lbs & 1,368 pages.
     No thanks!
The Point?

The Web has a ton of
data, but no easy,
hackday-able way to
access it.
THE WEB
 NEEDS
 AN API!
APIs are awesome.
You get (mostly) whatever data
you want in (mostly) whatever
format you want and in a (mostly)
easy to parse structure.
Example...
http://api.twitter.com/1/users/show.json?screen_name=derek
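A rough Python sketch of what calling an API like this looks like: build the URL, then decode the JSON that comes back. The endpoint is the one on the slide; the sample response body and the helper names are invented for illustration, and no network call is made.

```python
import json
from urllib.parse import urlencode

API_BASE = "http://api.twitter.com/1/users/show.json"  # endpoint from the slide


def build_request_url(screen_name):
    """Attach the screen_name parameter to the endpoint as a query string."""
    return API_BASE + "?" + urlencode({"screen_name": screen_name})


# Hypothetical response body, invented for illustration only.
SAMPLE_RESPONSE = '{"screen_name": "derek", "name": "Derek Gathright"}'


def parse_user(body):
    """Decode the JSON body into a plain dictionary."""
    return json.loads(body)


if __name__ == "__main__":
    print(build_request_url("derek"))
    print(parse_user(SAMPLE_RESPONSE)["name"])
```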
Companies discovered that if you build
 APIs, developers will come in droves
And it saves you from having to
dance around on stage like a monkey
           (if you are confused, Google “monkey dance”)
Yahoo, Google,
Facebook, Twitter,
Microsoft, NY Times, ...

Most web companies
offer APIs nowadays.
As neat as they are,
they are imperfect.
Why?
You have to read
documentation to use
them.
BOOOOOOO!!!!
We're developers,
we're lazy,
and we want to build stuff,
       NOW!
Yahoo invented a
 solution to this
   problem...
YQL!
"Yahoo! Query Language is an
expressive SQL-like language that
lets you query, filter, and join data
across Web services.

With YQL, apps run faster with
fewer lines of code and a smaller
network footprint."
        http://developer.yahoo.com/yql/
YQL is...

• RESTful
• Scalable
• Customizable
• ... and lots of other “ables”
How do you use it?
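$format = "json"; // or "xml"
$base = "http://query.yahooapis.com/v1/public/yql";
// $yql_query holds the YQL statement; URL-encode it before adding it to the query string
$url = "{$base}?q=" . urlencode($yql_query) . "&format={$format}";
$json_string = goGetIt($url); // goGetIt() is a placeholder, likely a curl call
$data = json_decode($json_string); // decode the string fetched above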

       Or use any of the libraries written for
        your favorite language/framework
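The same steps in Python, as a sketch rather than an official client. The base URL comes from the slide; the names build_yql_url and run_yql are made up here, and since the public endpoint may not be reachable from where you run this, the example only builds the URL and leaves the fetch uncalled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

YQL_BASE = "http://query.yahooapis.com/v1/public/yql"  # base URL from the slide


def build_yql_url(yql_query, fmt="json"):
    """URL-encode the query and append it, with the format, to the base URL."""
    return YQL_BASE + "?" + urlencode({"q": yql_query, "format": fmt})


def run_yql(yql_query):
    """Fetch and decode the response (needs network access; not called here)."""
    with urlopen(build_yql_url(yql_query)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_yql_url("SELECT * FROM weather.forecast WHERE location=90210"))
```

Letting urlencode handle the escaping avoids the classic bug of pasting a raw query (spaces, quotes, equals signs) straight into the URL.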
YQL Queries

 SELECT {fields}
  FROM {table}
WHERE {conditions}
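If you build queries in code, the template above maps onto a small string helper. This is a sketch only: build_select is a hypothetical name, it handles nothing beyond equality conditions joined with AND, and it does no escaping, so it suits trusted input only.

```python
def build_select(fields, table, conditions=None):
    """Assemble SELECT {fields} FROM {table} WHERE {conditions} as a string.

    Values are pasted in with no escaping, so this suits trusted input only.
    """
    query = "SELECT {} FROM {}".format(", ".join(fields), table)
    if conditions:
        clauses = ['{}="{}"'.format(key, value) for key, value in conditions.items()]
        query += " WHERE " + " AND ".join(clauses)
    return query


if __name__ == "__main__":
    print(build_select(["*"], "google.search", {"q": "pizza"}))
    print(build_select(["height", "width", "url"], "search.images"))
```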
SELECT * FROM weather.forecast
WHERE location=90210
SELECT * FROM data.html.cssselect
WHERE url="http://yahoo.com" AND css="body a";
SELECT height, width, url FROM search.images
WHERE query="kitteh" AND mimetype LIKE "%jpeg%"
SELECT * FROM google.search
WHERE q="pizza"
SELECT status.text FROM twitter.user.timeline
WHERE screen_name="derek"
SELECT * FROM foursquare.history
WHERE username="foo" AND password="bar"
SELECT * FROM rss WHERE url IN (
  SELECT title FROM atom
  WHERE url="http://spreadsheets.google.com/feeds/list/pg_T0Mv3iBwIJoc82J1G8aQ/od6/public/basic"
) LIMIT 10 | unique(field="title")
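The tail of that last query (LIMIT 10, then a pipe into unique) is easier to picture as two small list operations. Here is a local Python emulation, assuming the pipe hands the already-limited rows to unique and that unique keeps the first row for each distinct value. The rows are made up, and this is not how the service itself is implemented.

```python
def limit(rows, count):
    """Keep at most the first `count` rows."""
    return rows[:count]


def unique(rows, field):
    """Keep the first row seen for each distinct value of `field`."""
    seen = set()
    kept = []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept


# Made-up feed items standing in for the rows the rss table would return.
ITEMS = [
    {"title": "A", "link": "/1"},
    {"title": "B", "link": "/2"},
    {"title": "A", "link": "/3"},
]

if __name__ == "__main__":
    print(unique(limit(ITEMS, 10), field="title"))
```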
Where’s the magic?
Data Tables
13 categories & counting, including...


• Geo                  • Social
• Flickr               • Upcoming
• Local                • Weather
• Maps                 • Yahoo (Search)
• Meme                 • YMail
• Music                • YQL (Storage)
Open Data Tables
900+ community contributed tables in
hundreds of categories, including...

• Amazon              • Netflix
• Craigslist          • NY Times
• Facebook            • SimpleGeo
• Foursquare          • SPARQL
• Google              • Twitter
• HackerNews          • Wordpress
• LastFM              • YouTube
google.search

   http://www.datatables.org/google/google.search.xml

                   twitter.user.timeline

http://www.datatables.org/twitter/twitter.user.timeline.xml


                    foursquare.history

    http://www.datatables.org/foursquare/history.xml
http://www.datatables.org/craigslist/craigslist.search.xml
&query={query}




YQL != Voodoo Magic




It is just rewriting a YQL
query into one (or many)
    HTTP calls for you.
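A toy version of that rewriting step in Python, to show there is no magic involved. Everything here is hypothetical: the table name, the example.com URL template, and the to_http_call helper are stand-ins, and the real definitions are the XML files hosted on datatables.org.

```python
from urllib.parse import quote

# Hypothetical table registry: table name -> URL template with {placeholders}.
TABLES = {
    "example.search": "http://example.com/search?format=rss&query={query}",
}


def to_http_call(table, where):
    """Fill the table's URL template with the URL-encoded WHERE values."""
    template = TABLES[table]
    encoded = {key: quote(str(value), safe="") for key, value in where.items()}
    return template.format(**encoded)


if __name__ == "__main__":
    # Roughly: SELECT * FROM example.search WHERE query="used bike"
    print(to_http_call("example.search", {"query": "used bike"}))
```

A query that joins several tables would simply produce several of these URLs, which is the "one (or many) HTTP calls" part.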
USE "http://www.datatables.org/nyt/nyt.bestsellers.xml" AS nyt.bestsellers;
USE "https://github.com/gcb/yql.opentable/raw/master/text.concat.xml" AS text.concat;

SELECT text FROM text.concat
WHERE text.key1 = "http://www.amazon.com/dp/" AND
  (text.key2) IN (
    SELECT isbns.isbn.isbn10
    FROM nyt.bestsellers
    WHERE apikey='yourAPIKey'
  );

// Generates strings like "http://www.amazon.com/dp/031603617X"
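In plain Python terms, the query above amounts to "for every ISBN-10 the inner SELECT returns, glue it onto the Amazon prefix". A sketch under that reading: amazon_links is a made-up name, and the ISBN is the one from the example output on the slide rather than live bestseller data.

```python
PREFIX = "http://www.amazon.com/dp/"  # text.key1 in the query


def amazon_links(isbn10_values, prefix=PREFIX):
    """Concatenate the prefix (key1) with each ISBN-10 (key2)."""
    return [prefix + isbn for isbn in isbn10_values]


if __name__ == "__main__":
    # In the real query these values come from nyt.bestsellers.
    print(amazon_links(["031603617X"]))
```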
https://github.com/gcb/yql.opentable/raw/master/text.concat.xml
<execute>

• Execute arbitrary JavaScript in Rhino (a JS engine)
• E4X Support (XML literals in JS)
• Speak protocols and handle authentication:
  Basic auth, OAuth, XAuth, XML-RPC, ...
• Best feature? View-source!
Summary
•   YQL is very useful for...

    •   Scraping

    •   Creating an API where one doesn’t exist

    •   Converting XML -> JSON, & vice-versa

    •   JSONP for JS-only apps

    •   Many HTTP requests -> single HTTP request

    •   Server-side JS processing
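Two of those bullets (XML -> JSON and JSONP) are simple enough to sketch locally in Python, just to show what the service saves you from writing. This is an illustration, not YQL's actual conversion rules: attributes are ignored, repeated sibling tags overwrite each other, and the helper names and sample XML are made up.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Turn an element into nested dicts, or a string if it has no children."""
    children = list(element)
    if not children:
        return element.text or ""
    # Simplification: a repeated child tag overwrites the earlier one.
    return {child.tag: element_to_dict(child) for child in children}


def xml_to_json(xml_text):
    """Parse XML text and serialize it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)})


def as_jsonp(json_text, callback):
    """Wrap a JSON string in a callback call, as a JSONP response would."""
    return "{}({});".format(callback, json_text)


if __name__ == "__main__":
    sample = "<user><name>derek</name><site>derek.io</site></user>"
    print(as_jsonp(xml_to_json(sample), "handleUser"))
```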
NY Times Data Tables
•   nyt.article.search        •   nyt.newswire

•   nyt.bestsellers           •   nyt.people.activities

•   nyt.bestsellers.history   •   nyt.people.followers

•   nyt.bestsellers.search    •   nyt.people.following

•   nyt.movies.critics        •   nyt.people.newsfeed

•   nyt.movies.picks          •   nyt.people.profiles

•   nyt.movies.reviews        •   nyt.people.users
Get started @
   http://developer.yahoo.com/yql



            Thanks!

Questions? Find, Tweet, or Email me.
 @derek or drg@yahoo-inc.com
